ABSTRACT

This study demonstrated that item response theory, latent class
models, and the rule space model introduced by Tatsuoka (1985) and
Tatsuoka and Tatsuoka (1987) are algebraically related. Specifically,
it was shown (1) that IRT functions may actually be regarded as the
conditional density functions of item scores for a special latent class
representing the null state of knowledge (i.e., the state that would
ideally produce a response vector of all zeros); and (2) that
estimates of the item parameters of IRT functions can be determined from
the union of several latent classes with the following property: when
their response vectors are mapped into rule space, the centroids of
these projections lie approximately along the first principal axis of
the union set.

Bug distributions, which are density functions of the numbers of
slips away from the ideal rule-generated response patterns, play an
important role in interrelating IRT and latent-class models; they in
fact hold the key to the development of a general theory of rule space
that includes these two models as special cases. Furthermore, bug
distributions form the basis for developing new indices that measure the
stability of states or rules and the consistency with which a particular
rule is applied with no intrusion of slips.
Recent advances in cognitive theory provide new insights into human
thinking and learning processes. Linn (1985) pointed out that it is
important to develop a new theory and measurement technique in order to
measure the volatile learning activities described in Glaser (1985) and
to assess cognitive skill acquisition. Glaser (1985) summarized the main
objectives of new achievement measures in four categories:
1) diagnosing the principles of performance; 2) assessing theory
changes; 3) evaluating a structure or representation of problems; and
4) assessing the automaticity of performance skills. A modern
measurement theory must be developed by taking the four objectives into
account. This implies that it is necessary to establish a concept of
item construction that is different from the classical foundation.

Traditionally, item construction originated from the evaluation of
content validity, that is, how a test covers subject matter and
situations. Again, Glaser (1985) suggested that test items could be
comprised of two elements: "information that needs to be known and
information about the conditions under which use of this knowledge is
appropriate." As for the former element, there are various stages of
competence in students' knowledge, including cognitive skills. Also, it
is important to assess what knowledge structure the students have.
Greeno (1980) pointed out that the acquisition of declarative and
procedural knowledge is usually an objective of instruction, but that
strategic knowledge that enables one to set goals and subgoals and to
form plans for attaining goals is not explicitly taught. Different item
types often require the students to decide which solution path should be
taken, and what should be done first, to reach the final answer.


Many erroneous rules discovered in past research (Brown & Burton,
1978; VanLehn, 1983; Tatsuoka & Tatsuoka, 1981) originated from a lack
of the strategic skill described by Greeno. Therefore, new measurements
must include the information needed for prescriptive diagnosis of
students' erroneous rules or sources of misconception. The new test
design must be capable of reflecting and discriminating between the
different knowledge structures possessed by individuals. Each structure
requires its unique strategies to set subgoals and goals and to find
solution paths. As a result, different structures produce different
sets of erroneous rules; some rules may be common to both structures,
while others belong to just one of them. The modern measurement theory
must be able to discriminate one knowledge structure from another. The
third condition, "theory change," is stated as "hypothesis testing."
When learning takes place, students test their hypotheses and then
evaluate, examine, and modify current theories on the basis of new
information. It is not unusual for students to change from one
erroneous rule to another several times before reaching the mastery
stage. Measurements from new kinds of tests must capture the traces of
these performance changes in detail in order to increase the educational
utility of responses to the test. The goals to be attained in modern
measurement theory are not easy. Apparently, the technical barriers, as
Linn states, are high, and the traditional theories of educational
measurement and testing have only limited power, or are simply
inapplicable to the new measures.

In this paper, the pros and cons of two representative test models,
Item Response Theory (Lord & Novick, 1968) and Latent Class models
(Lazarsfeld & Henry, 1968), will be discussed with respect to the demands
of modern measurement theory, together with their interrelationships
with rule space. Discussion will focus on their modeling assumptions
and on the conditional density functions of latent rules (classes or
groups). It will also be shown that IRT becomes a special case of
latent rules (classes or groups).

A  cluster  of  response  patterns  around  Rule  R 

If a student applies his/her erroneous rule with perfect
consistency to the items in the test, then his/her responses to the test
will be perfectly matched with the responses generated by a computer
program. We call such systematic errors erroneous rules or rules. A
correct rule will, by definition, produce the right answer to all the
items. Although wrong rules sometimes may produce the right answer to
some subset of the test items, it is very unlikely that they will
produce the right answer to all the items. We further assume that the
test items are carefully constructed so that the important, predicted
common errors can be expressed by unique item response patterns of ones
and zeros. Therefore, rule R can be represented by a binary vector
$R = (r_1, r_2, \ldots, r_n)$. However, actual students' performances on
the test items are unlikely to be perfectly consistent and are subject
to random errors or slips due to carelessness or uncertainty that always
affect the outcomes of performances on a test. Even if a student
possesses some systematic error, it is rare to have the response pattern
perfectly matched with the pattern theoretically generated by its
algorithm (VanLehn, 1983; Tatsuoka, 1984). Some systematic errors may
have a tendency to produce more slips while other rules produce fewer
slips. Some items may be prone to produce more slips than other items.
Thus, it would not be realistic to assume that all the items have equal
slip probabilities.

Bug  Distributions 

Tatsuoka & Tatsuoka (1987) derived the theoretical distribution of
observed slips, and called it the "bug distribution." First, the
probability of having a slip on item j (j = 1, 2, ..., n) is denoted by
$p_j$ for item j:

(1)  $p_{j|R} = \mathrm{Prob}(\text{having a slip on item } j \mid R) = \mathrm{Prob}(u_j = 1 \mid R)$

where $u_j$ is a random variable such that $u_j = 1$ if a slip occurs on
item j and $u_j = 0$ if not, and Rule R is a vector
$R = (r_1, r_2, \ldots, r_n)$:

(2)  $u_j = 1$ if a slip occurs (i.e., if $x_j \neq r_j$)
     $u_j = 0$ otherwise (i.e., if $x_j = r_j$)

More succinctly, $u_j$ may be defined as

(2a)  $u_j = |r_j - x_j|$

Given the reasonable assumption that slips occur independently across
items, the bug distribution of rule R follows a compound binomial
distribution with different slip probabilities for the items:

(3)  $\mathrm{Prob}(\text{having up to } s \text{ slips} \mid R) = \sum_{m=0}^{s} \; \sum_{\Sigma u_j = m} \left[ \prod_{j=1}^{n} p_j^{u_j} (1 - p_j)^{1 - u_j} \right]$
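As an illustration of the compound binomial above, the following Python sketch evaluates the probability of having up to s slips in two ways: by the literal double sum over all slip vectors, and by an equivalent item-by-item recursion that avoids enumerating the 2^n vectors. The function names and the four slip probabilities are hypothetical, chosen only for this example; they are not values from the study.

```python
from itertools import product
from math import prod


def slip_count_pmf(slip_probs):
    """Return [Prob(exactly m slips) for m = 0..n] for independent items."""
    pmf = [1.0]  # with no items, zero slips occur with probability 1
    for p in slip_probs:
        nxt = [0.0] * (len(pmf) + 1)
        for m, mass in enumerate(pmf):
            nxt[m] += mass * (1.0 - p)  # no slip on this item
            nxt[m + 1] += mass * p      # a slip on this item
        pmf = nxt
    return pmf


def prob_up_to(slip_probs, s):
    """Cumulative probability of having up to s slips (recursive form)."""
    return sum(slip_count_pmf(slip_probs)[: s + 1])


def prob_up_to_bruteforce(slip_probs, s):
    """Literal double sum over all slip vectors u with sum(u) <= s."""
    total = 0.0
    for u in product((0, 1), repeat=len(slip_probs)):
        if sum(u) <= s:
            total += prod(p if uj else 1.0 - p for p, uj in zip(slip_probs, u))
    return total


# Hypothetical slip probabilities for a four-item test.
example_probs = [0.10, 0.25, 0.05, 0.40]
print(prob_up_to(example_probs, 1), prob_up_to_bruteforce(example_probs, 1))
```

The two functions should agree up to floating-point error; the recursion is the one to prefer for realistic test lengths such as 40 items.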


Since R can be any rule, the number of slips from the correct rule
$R = (1, 1, \ldots, 1) = \mathbf{1}$ also follows a compound binomial
distribution with slip probabilities
$p_{j|\mathbf{1}} = \mathrm{Prob}(u_j = 1 \mid \mathbf{1})$, $j = 1, \ldots, n$.
If the elements of Rule R are all zeros, $\mathbf{0} = (0, 0, \ldots, 0)$,
then the slip probabilities from this wrong rule will be given by
$p_{j|\mathbf{0}} = \mathrm{Prob}(u_j = 1 \mid \mathbf{0})$, $j = 1, \ldots, n$.

Property 1. The slip probabilities of the bug distribution are
determined by the logistic function of Item Response Theory. The slip
probabilities $p_{j|R}$ are given by Equation 4:

(4)  $p_{j|R} = \mathrm{Prob}(u_j = 1; R) = \mathrm{Prob}(x_j \neq r_j; R) = r_j Q_j(\theta_R) + (1 - r_j) P_j(\theta_R)$

where $x_j$ is the observed score of item j and $P_j(\theta_R)$ is the
IRT function at the $\theta$ level associated with rule R.

Suppose Rule R corresponds to a vector R = (1111000), the first
four elements being ones and the others being zeros. The random
variable $u_j$ will be 1 if a slip from $r_j$ occurs and zero if not. If
a student's performance on the seven items results in a response pattern
of two slips away from R, then two items have different values from the
elements of vector R. Suppose the two slips occurred on items 1 and 7;
then the corresponding response pattern will be x = (0111001). The
middle member of Equation 4 can be rewritten as follows:

(5)  If $r_j = 1$, then $\mathrm{Prob}(x_j \neq r_j; R) = \mathrm{Prob}(x_j = 0; R)$
     If $r_j = 0$, then $\mathrm{Prob}(x_j \neq r_j; R) = \mathrm{Prob}(x_j = 1; R)$

It is known that the probability of score 1 for item j is the logistic
function $P_j(\theta_R)$, that is, $\mathrm{Prob}(x_j = 1; R) = P_j(\theta_R)$;
thus Equation 5 can be written in the single Equation 4, viz.:

     $\mathrm{Prob}(x_j \neq r_j; R) = r_j Q_j(\theta_R) + (1 - r_j) P_j(\theta_R)$.
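The seven-item example can be traced in a few lines of Python. This is a minimal sketch under stated assumptions: the item parameters `a` and `b` and the ability level `theta_R` are hypothetical, and the two-parameter logistic function is written without a scaling constant. Only the rule (1111000) and the observed pattern (0111001) come from the text.

```python
from math import exp


def irt_2pl(theta, a, b):
    """Two-parameter logistic IRT function P_j(theta), no scaling constant."""
    return 1.0 / (1.0 + exp(-a * (theta - b)))


def slip_probabilities(rule, p_correct):
    """Equation 4: r_j * Q_j + (1 - r_j) * P_j for each item j."""
    return [r * (1.0 - p) + (1 - r) * p for r, p in zip(rule, p_correct)]


rule = [1, 1, 1, 1, 0, 0, 0]      # R = (1111000), from the text
observed = [0, 1, 1, 1, 0, 0, 1]  # x = (0111001), from the text

# Slip indicators u_j = |r_j - x_j| (Equation 2a): slips on items 1 and 7.
slips = [abs(r - x) for r, x in zip(rule, observed)]

# Hypothetical item parameters and ability level for illustration only.
a = [1.0] * 7
b = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
theta_R = 0.2

p_correct = [irt_2pl(theta_R, aj, bj) for aj, bj in zip(a, b)]
slip_probs = slip_probabilities(rule, p_correct)
print(slips)
print([round(s, 3) for s in slip_probs])
```

For binary r_j, the value returned by `slip_probabilities` coincides with the absolute difference between r_j and P_j, which is why a more compact notation for the slip probability is possible.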
From Equations 2, 3, 4, and 5, the slip probability is given by a
weighted mean with weights $r_j$ and $1 - r_j$, as follows:

(6)  $p_{j|R} = r_j Q_j(\theta_R) + (1 - r_j) P_j(\theta_R)$

Alternatively, to simplify the notation, we may write

(6a)  $p_{j|R} = |r_j - P_j(\theta_R)|$

Equation 6 shows that any slip probability $p_j$ is a function of
$\theta$. In order to emphasize this fact, $p_j$ will be denoted by
$S_j(\theta)$ hereafter. The conditional distribution function of the
number of slips given by Equation 3 will be rewritten in terms of
$S_j(\theta)$ in Equation 7 by replacing $p_j$ by $S_j(\theta)$, and
Equation 8 is the generating function of expression (7). (We omit the
subscript R in $\theta_R$ as understood.)


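(7)  $\mathrm{Prob}(\text{having up to } s \text{ slips from } R) = \sum_{m=0}^{s} \left\{ \sum_{\Sigma u_j = m} \prod_{j=1}^{n} S_j(\theta)^{u_j} (1 - S_j(\theta))^{1 - u_j} \right\}$

(8)  $g(\theta; R) = \prod_{j=1}^{n} \{ S_j(\theta) + (1 - S_j(\theta)) \}$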


The expectation and variance of the number of slips from rule R are
given by (9) and (10):

(9)  $\mu_R = \sum_{j=1}^{n} S_j(\theta) = \sum_{r_j = 0} P_j(\theta) + \sum_{r_j = 1} Q_j(\theta)$

(10)  $\sigma_R^2 = \sum_{j=1}^{n} S_j(\theta)(1 - S_j(\theta)) = \sum_{r_j = 0} P_j(\theta) Q_j(\theta) + \sum_{r_j = 1} Q_j(\theta) P_j(\theta)$
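A small numerical sketch of Equations 9 and 10 follows. The rule and the item probabilities P_j are hypothetical; the function name is introduced here for illustration. The slip probability of each item is taken as P_j when r_j = 0 and Q_j = 1 - P_j when r_j = 1, as in Equation 4.

```python
def slip_mean_and_variance(rule, p_correct):
    """Expected number of slips (Eq. 9) and its variance (Eq. 10) for a rule."""
    # S_j is P_j when r_j = 0 and Q_j = 1 - P_j when r_j = 1.
    slip_probs = [abs(r - p) for r, p in zip(rule, p_correct)]
    mean = sum(slip_probs)
    variance = sum(s * (1.0 - s) for s in slip_probs)
    return mean, variance


# Hypothetical four-item example.
rule = [1, 1, 0, 0]
p_correct = [0.9, 0.8, 0.3, 0.1]

mean, variance = slip_mean_and_variance(rule, p_correct)

# The same quantities written as the two partial sums over r_j = 0 and r_j = 1.
mean_split = (sum(p for r, p in zip(rule, p_correct) if r == 0)
              + sum(1.0 - p for r, p in zip(rule, p_correct) if r == 1))
print(mean, mean_split, variance)
```

With these numbers the slip probabilities are 0.1, 0.2, 0.3, and 0.1, so the expected number of slips is 0.7 and the variance is 0.55.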
A  Measure  of  Rule  Stability

The expectation and variance given by Equations (9) and (10) are
not those of the total score, as is customary in conditional density
functions of latent rules or classes; rather, they refer to the number
of slips. The expectation of the number of slips from a rule is a
measure of the instability of the rule, since the expectation represents
the average number of slips from that rule, and the variance is a
measure of the extent to which the number of slips made varies from
student to student. For example, for the erroneous rule producing all
wrong answers, expressed by the vector $\mathbf{0} = (0, 0, \ldots, 0)$,
the expectation $\sum_{j=1}^{n} P_j(\theta_0)$ will be very small
because the values of $P_j(\theta_0)$, $j = 1, \ldots, n$, are very
small, nearly zero. Therefore, Rule 0 is very stable and slips rarely
occur. At the same time, the right rule $\mathbf{1} = (1, 1, \ldots, 1)$
has the expectation $\sum_{j=1}^{n} Q_j(\theta_1)$, which also is very
small. We can conclude that any students who are in the state of
mastery can execute the right rule systematically and the probability of
having any slip deviating from the right rule is very small. In
general, the mean number of slips from 1 to 0 will be
$\sum_{r_j = 1} S_j(\theta)$ and the mean number of slips from 0 to 1
will be $\sum_{r_j = 0} S_j(\theta)$. The expected number of slips will
be $\sum_{j=1}^{n} S_j(\theta)$.

Now let us consider a rule R whose elements are about half ones and
half zeros. Then the probability of having slips will be close to .5
(Tatsuoka, 1986), which implies that such rules have a 50% chance of
having slips away from their perfect execution. Moreover, since the
conditional expectation of the bug distribution is larger around the
mid-range of $\theta$ (Tatsuoka, 1986), the number of slips expected to
occur there will be fairly large. This means that rules used by many
average ability students will tend to have more slips than the rules
that very high or very low ability students are likely to use, and the
stability of such rules is lower because the probability of having slips
is higher than for rules espoused by high or low ability students.

A  Measure  of  Rule  Consistency 

The expectation of the number of slips is given in Equation 9,
which can be regarded as a measure of how stable this rule is. The
variance given in Equation 10 represents the dispersion or spread of
the number of slips.

However, consistency of a rule R is a different concept. If a
student uses rule R with perfect consistency then the resulting response
pattern matches the binary pattern generated by a logically programmed
algorithm for rule R. Therefore, the probability of not having any
slips can be obtained by setting $u_j = 0$ for $j = 1, \ldots, n$:

(11)  $\mathrm{Prob}(\text{perfect execution of rule } R) = \prod_{j=1}^{n} (1 - S_j(\theta))$

The probability obtained from Equation 11 is an index of consistency and
represents the probability of systematic execution of rule R. However,
the value of the consistency defined in this manner will be extremely
small as the number of test items becomes large. The consistency
measure must be independent of the test length. The most plausible
candidate for the consistency index is

     $C_R = \left[ \prod_{j=1}^{n} (1 - S_j(\theta)) \right]^{1/n}$

which is the geometric mean of the probabilities of not having slips on
items 1, 2, ..., n, i.e., of the n factors of the right-hand member of
Equation 11.
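The effect of test length on the two quantities can be seen in a short Python sketch. The slip probabilities below are hypothetical, and the function names are introduced only for this illustration. The geometric mean is computed through logarithms, which assumes every slip probability is strictly less than 1.

```python
from math import exp, log, prod


def perfect_execution_probability(slip_probs):
    """Equation 11: product over items of (1 - S_j)."""
    return prod(1.0 - s for s in slip_probs)


def consistency_index(slip_probs):
    """Geometric mean of the n factors (1 - S_j); assumes every S_j < 1."""
    n = len(slip_probs)
    return exp(sum(log(1.0 - s) for s in slip_probs) / n)


# Hypothetical two-item test, then the same item types repeated 20 times.
short_test = [0.1, 0.2]
long_test = short_test * 20

print(perfect_execution_probability(short_test), consistency_index(short_test))
print(perfect_execution_probability(long_test), consistency_index(long_test))
```

Lengthening the test with items of the same kind drives the probability of perfect execution toward zero, while the geometric mean stays the same, which is the property the consistency index is meant to have.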

Relationship  Between  Bug  Distribution  and  Item  Response  Theory  Model 


The bug distribution was formulated using the notions of slips and
slip probabilities. It was assumed that the occurrence of slips was
independent across the items, and the probability of a slip occurring
for each item was denoted by $p_j$, $j = 1, \ldots, n$. Each item j has
its unique chance of having a slip away from $r_j$, and different values
of slip probabilities are assumed across the items. Then, the
probability of having some finite number of slips away from Rule R was
given by Equation 3 with the slip variable $u_j$ defined by Equation 2
or 2a. The derivation of the compound binomial distribution (3) is
applicable to any rule; the set of response patterns resulting from
inconsistent application of rule R was introduced as a cluster around
rule R in Tatsuoka & Tatsuoka (1987) and was denoted by {R}. When an
erroneous rule produces wrong answers for all the items in a test, it
corresponds to the null vector, $\mathbf{0} = (0, 0, \ldots, 0)$. In
this case the random variable $u_j$ is the same thing as the random
variable of item score $x_j$, so
$\mathrm{Prob}(u_j = 1 \mid \mathbf{0}) = \mathrm{Prob}(x_j = 1 \mid \theta_0)$,
where $\theta_0$ is the latent ability level of rule 0, and the slip
probabilities of the n items become the logistic functions
$P_j(\theta)$ of IRT models. Therefore, it can be said that the IRT
model is equivalent to the latent class model associated with the rule
that produces the null vector, 0. The likelihood functions of the
rule-0 latent class and IRT are as follows:


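(13)  $L(\mathbf{0}) = \prod_{i \in \{\mathbf{0}\}} \prod_{j=1}^{n} \left[ P_j(\theta_i)^{x_{ji}} (1 - P_j(\theta_i))^{1 - x_{ji}} \right]$

(14)  $L(\mathrm{IRT}) = \prod_{i=1}^{N} \prod_{j=1}^{n} P_j(\theta_i)^{x_{ji}} (1 - P_j(\theta_i))^{1 - x_{ji}}$

where $x_{ji}$ is the score of subject i on item j, and N is not the
number of subjects belonging to the cluster of latent rule 0, {0}, but
is simply the sample size.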

A sample whose response patterns are well described by the IRT
model (i.e. an "IRT sample") may contain clusters around many rules,
including the cluster around the correct rule 1. Let R be one of many
rules contained in the IRT sample. Then, the likelihood of the bug
distribution associated with Rule R is

(15)  $L(R) = \prod_{i \in \{R\}} \prod_{j=1}^{n} \left[ S_j(\theta_i)^{u_{ji}} (1 - S_j(\theta_i))^{1 - u_{ji}} \right]$.

The relationship between the two variables u and x was already given in
Equation 2a, viz., $u_j = |r_j - x_j|$. Also, the slip probability
$S_j(\theta)$ is given by Equation 6a, with $p_{j|R}$ now rewritten as
$S_j(\theta)$. Substituting $u_j$ and $S_j(\theta)$ from these two
relations, Equation 16 is obtained:

(16)  $L(R) = \prod_{i \in \{R\}} \prod_{j=1}^{n} |r_j - P_j(\theta_i)|^{|r_j - x_{ji}|} \left( 1 - |r_j - P_j(\theta_i)| \right)^{1 - |r_j - x_{ji}|}$


Separating the multiplication over j into those factors for which
$r_j = 1$ and those for which $r_j = 0$, we get

     $L(R) = \prod_{i \in \{R\}} \left\{ \prod_{r_j = 1} [1 - P_j(\theta_i)]^{1 - x_{ji}} [1 - (1 - P_j(\theta_i))]^{1 - (1 - x_{ji})} \times \prod_{r_j = 0} P_j(\theta_i)^{x_{ji}} [1 - P_j(\theta_i)]^{1 - x_{ji}} \right\}$

or

(17)  $L(R) = \prod_{i \in \{R\}} \left\{ \prod_{r_j = 1} P_j(\theta_i)^{x_{ji}} Q_j(\theta_i)^{1 - x_{ji}} \prod_{r_j = 0} P_j(\theta_i)^{x_{ji}} Q_j(\theta_i)^{1 - x_{ji}} \right\}$


Therefore expression (17) becomes the same as the conventional
likelihood expressed in terms of the score variable $x_j$ and IRT
function $P_j(\theta)$ upon combining the $r_j = 1$ and $r_j = 0$ cases.
Thus, Equation 18 is obtained:

(18)  $L(R) = \prod_{i \in \{R\}} \prod_{j=1}^{n} \left[ P_j(\theta_i)^{x_{ji}} Q_j(\theta_i)^{1 - x_{ji}} \right]$


Equation 18 is exactly the likelihood of latent class R, which is
referred to as the cluster around rule R in this paper. That is,

(19)  $L(R) = \prod_{i \in \{R\}} \prod_{j} \left[ S_j(\theta_i)^{u_{ji}} (1 - S_j(\theta_i))^{1 - u_{ji}} \right] = \prod_{i \in \{R\}} \prod_{j} \left[ P_j(\theta_i)^{x_{ji}} (1 - P_j(\theta_i))^{1 - x_{ji}} \right]$
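The equality of the two likelihoods can be checked numerically. The sketch below is an illustration under stated assumptions: the rule, the item probabilities, and the three response vectors are hypothetical, and for simplicity all members of the cluster are assumed to share the same values of P_j (a single ability level for the cluster).

```python
def bug_likelihood(rule, p_correct, responses):
    """Likelihood written with slip indicators u and slip probabilities S."""
    likelihood = 1.0
    for x in responses:
        for r, p, xj in zip(rule, p_correct, x):
            s = abs(r - p)   # slip probability: Q_j if r_j = 1, P_j if r_j = 0
            u = abs(r - xj)  # slip indicator u_j = |r_j - x_j|
            likelihood *= (s ** u) * ((1.0 - s) ** (1 - u))
    return likelihood


def irt_likelihood(p_correct, responses):
    """Conventional likelihood written with item scores x and P_j."""
    likelihood = 1.0
    for x in responses:
        for p, xj in zip(p_correct, x):
            likelihood *= (p ** xj) * ((1.0 - p) ** (1 - xj))
    return likelihood


# Hypothetical cluster around the rule (1, 1, 0, 0).
rule = [1, 1, 0, 0]
p_correct = [0.9, 0.8, 0.3, 0.1]
responses = [[1, 1, 0, 0], [0, 1, 0, 1], [1, 1, 1, 0]]

print(bug_likelihood(rule, p_correct, responses))
print(irt_likelihood(p_correct, responses))
```

The two printed values should agree up to floating-point error for any binary rule, since changing the rule only relabels which outcome on each item counts as a slip.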


Suppose a sample (IRT) that fits the IRT model well contains K + 2
latent classes, that is, the clusters around K rules besides Rule 0 and
Rule 1, denoted by $\{\mathbf{0}\}, \{R_1\}, \{R_2\}, \ldots, \{R_K\}, \{\mathbf{1}\}$;
then the IRT sample must be the union set of the K + 2 latent classes:

(20)  $(\mathrm{IRT}) = \{\mathbf{0}\} \cup \{R_1\} \cup \{R_2\} \cup \cdots \cup \{R_K\} \cup \{\mathbf{1}\} = \bigcup_{k=0}^{K+1} \{R_k\}$.
By not counting subjects who may belong to the gray area between
two clusters, and taking memberships in the clusters to be mutually
exclusive, we conjecture that the likelihood of the IRT model can be
given by the equation below.

(21)  $L(\mathrm{IRT}) = \prod_{i \in \bigcup_{k=0}^{K+1} \{R_k\}} \; \prod_{j=1}^{n} \left[ P_j(\theta_i)^{x_{ji}} (1 - P_j(\theta_i))^{1 - x_{ji}} \right] = \prod_{i=1}^{N} \prod_{j=1}^{n} \left[ P_j(\theta_i)^{x_{ji}} (1 - P_j(\theta_i))^{1 - x_{ji}} \right]$


An assumption required in IRT models is the unidimensionality of data.
Therefore the union of K + 2 sets expressed in (20) must satisfy the
unidimensionality condition in order to yield estimates of logistic
parameters. The most intuitive explanation for the union of K + 2
clusters to become unidimensional is that their centroids are located on
the principal axis of the IRT sample and the first eigenvalue is
considerably larger than the others. Moreover, each class $\{R_k\}$
follows the compound binomial distribution given by Equation 7 with the
slip probabilities given in Equation 9. By rewriting the bug
distribution in the form of a conventional conditional density function
using the relation given in Equation 19, each rule $R_k$ is seen to have
the likelihood
$\prod_{j=1}^{n} P_j(\theta_R)^{r_j} (1 - P_j(\theta_R))^{1 - r_j}$,
which indicates the frequency with which rule $R_k$ is chosen by
students, and is hence a measure of the popularity of rule $R_k$ among
students. Let us denote the stability and consistency of rule $R_k$ by
$S_k$ and $C_k$, respectively; then each rule $R_k$ is characterized by
the values of its stability and consistency.


The conditional density function of a given latent class R and the bug
distribution of Rule R


In the previous section, it was shown that IRT could be regarded as
a special case of latent classes, and that the conditional probability
function, $P_j(\theta)$, of the IRT model becomes identical to the bug
distribution of Rule 0. The relation was given by Equations 13 and 14.
Moreover, the likelihood function of any rule R, written in terms of
slip variable $u_j$ and slip probability $S_j(\theta)$ in Equation 15,
can be rewritten by using the score $x_j$ of item j and the IRT
conditional probability function $P_j(\theta)$ in Equation 18. The slip
variable $u_j$ can be transformed into the item score variable $x_j$ by
a relation parallel to Equation 2a, viz.:

(22)  $x_j = r_j (1 - u_j) + (1 - r_j) u_j = |r_j - u_j|$


Therefore, the total score, $\sum_{j=1}^{n} x_j$, depends on the
elements $r_j$ of rule R, and becomes

     $\sum_{j=1}^{n} x_j = \sum_{j=1}^{n} r_j (1 - u_j) + \sum_{j=1}^{n} (1 - r_j) u_j = \sum_{j=1}^{n} r_j - \sum_{r_j = 1} u_j + \sum_{r_j = 0} u_j$

that is, the sum of $1 - u_j$ over the items for which $r_j = 1$ plus
the sum of $u_j$ over the items for which $r_j = 0$ or, equivalently,
$\sum_{j=1}^{n} r_j$, the total score of R, plus the difference between
the number of slips 0 → 1 and 1 → 0. By taking the expectation of
$x_j$ and $u_j$ of Equation 22 for a given $\theta$, Equation 24 is
obtained.


(24)  $E(x_j = 1 \mid \theta) = R_j - R_j \, E(u_j = 1 \mid \theta) + (1 - R_j) \, E(u_j = 1 \mid \theta)$

Replacing $E(x_j = 1 \mid \theta)$ by $P_j(\theta)$ and
$E(u_j = 1 \mid \theta)$ by $S_j(\theta)$, Equation 24 can be rewritten
as $P_j(\theta) = R_j - R_j S_j(\theta) + (1 - R_j) S_j(\theta)$.
Summing over j from 1 through n, and expressing $S_j(\theta)$ by
$P_j(\theta)$ or $Q_j(\theta)$ as appropriate, the following equation is
obtained:

(25)  $\sum_{j=1}^{n} P_j(\theta_R) = \sum_{j=1}^{n} R_j - \sum_{R_j = 1} Q_j(\theta_R) + \sum_{R_j = 0} P_j(\theta_R)$

where $\sum_{j=1}^{n} R_j$ is the number of ones in R. The expected
variance of the total score will be
$\sum_{j=1}^{n} P_j(\theta_R) Q_j(\theta_R)$ (Lord, 1980). The
variances of $\sum_{r_j = 1} u_j$ and $\sum_{r_j = 0} u_j$ are
$\sum_{r_j = 1} S_j(\theta_R)(1 - S_j(\theta_R))$ and
$\sum_{r_j = 0} S_j(\theta_R)(1 - S_j(\theta_R))$, respectively. Adding
the two sums yields $\sum_{j=1}^{n} S_j(\theta_R)(1 - S_j(\theta_R))$,
which is equal to expression (10). Therefore, the relationship between
the bug distribution of slips away from rule R and the conditional
density functions of latent class R is as summarized below:


I)  Both have the same likelihood function, as can be seen in Equation 19.

II)  Both have the same expected variance, given in Equation 10.

III)  The expectation of the conditional density function of latent class
R, $E(x_j = 1 \mid R)$, is the sum of the number of ones in R and the
difference between the expected number of slips changing from "0
to 1" and that from "1 to 0".
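Point III can be checked by brute force on a small test. The following sketch enumerates every slip vector for a hypothetical four-item rule, converts each to an item-score vector through x_j = |r_j - u_j|, and compares the resulting expected total score with the closed form: the number of ones in R, minus the expected number of 1-to-0 slips, plus the expected number of 0-to-1 slips. All numbers and function names are assumptions made for this illustration.

```python
from itertools import product
from math import prod


def expected_total_score_by_enumeration(rule, slip_probs):
    """Average the total score over all 2^n slip vectors u."""
    expected = 0.0
    for u in product((0, 1), repeat=len(rule)):
        prob = prod(s if uj else 1.0 - s for s, uj in zip(slip_probs, u))
        score = sum(abs(r - uj) for r, uj in zip(rule, u))  # x_j = |r_j - u_j|
        expected += prob * score
    return expected


def expected_total_score_closed_form(rule, slip_probs):
    """Ones in R, minus expected 1-to-0 slips, plus expected 0-to-1 slips."""
    ones = sum(rule)
    one_to_zero = sum(s for r, s in zip(rule, slip_probs) if r == 1)
    zero_to_one = sum(s for r, s in zip(rule, slip_probs) if r == 0)
    return ones - one_to_zero + zero_to_one


# Hypothetical rule and item probabilities P_j.
rule = [1, 1, 0, 0]
p_correct = [0.9, 0.8, 0.3, 0.1]
slip_probs = [abs(r - p) for r, p in zip(rule, p_correct)]

print(expected_total_score_by_enumeration(rule, slip_probs))
print(expected_total_score_closed_form(rule, slip_probs))
print(sum(p_correct))
```

All three printed values should equal 2.1 up to floating-point error: 2 ones in the rule, minus 0.3 expected slips from 1 to 0, plus 0.4 expected slips from 0 to 1.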


Interpretability of estimated parameters, factors and clusters

Cluster analysis, factor analysis and latent class models are developed
for finding several groups into which subjects are assigned so that
subjects belonging to the same group are more similar than are subjects
belonging to different groups. However, it is not unusual to encounter
difficulties in interpreting the estimates for the psychological models,
clustered groups or factors. Unlike most psychological models, rule
space has been developed by emphasizing the importance of
interpretability of statistics estimated from the data.


Tatsuoka (1986, 1987) expressed task attributes involved in n items
by a binary matrix (called the Attribute x Item matrix) in which the
element $Q_{kj}$ is 1 or 0: $Q_{kj} = 1$ means that item j requires
subtask k and $Q_{kj} = 0$ means that item j does not require subtask k.
If students use two different strategies to solve the items, then two
matrices are constructed with different sets of task attributes and item
task vectors. Figures 1 and 2 show two Attribute x Item matrices based
on two distinctly different strategies for solving fraction subtraction
problems.

Figures 1 & 2 about here

The first strategy (Method A) is to solve the problems by always
converting a mixed number (e.g. 3 1/4) to a simple fraction (e.g. 13/4)
and adding or subtracting the two fractions. The second strategy
(Method B) involves separating the whole number part from the fraction
part, adding or subtracting the two numbers independently, then
combining the answers. Method A requires better computational skills
while Method B requires deeper understanding of the number system. The
borrowing skill is not required by Method A, while it has an important
role in Method B. As a result, erroneous rules resulting from borrowing
skills will not appear in students using Method A, but they will often
be observed in those who use Method B. Table 1 shows $\theta_R$,
$\zeta$, stability, slip dispersion, and likelihood of the latent
classes obtained by Method A and of those produced by borrowing errors
when Method B is used. The interpretation of these classes is given in
Appendix I.

Insert Table 1 about here

The values in Table 1 are based on the two-parameter logistic model
obtained from a sample of size N = 543, and the group of Method A users
was identified by the rule space diagnostic mechanism of Tatsuoka
(1986). The third column contains the number of students classified
into each of 18 latent classes, students in each class being diagnosed
as having the source of corresponding misconceptions as described in
Appendix I. The fourth and fifth columns show the positions of the
classes in the rule space. Since $\zeta$ is fairly large for class 7
(1.48), this class is unusual, so the probability of observing the
class 7 misconception will be small, while class 8 ($\zeta$ = -.16) will
be observed often. The sixth column is the number of slips each class
may expect to have. Class 5 has a mean number of almost 15 slips, or
36.6% of the 40 items. Therefore, class 5 represents a very unstable
misconception and as such it may be easier to remediate. Classes 1 and
12 are in a fairly stable state compared to classes 5, 6, 7, 8, and 30.
The classes located in the neighborhood of the mean $\theta$ value of
the group tend to be less stable.


Investigation on the Conjecture

The conjecture described in Equation (20), that an IRT sample is the
union set of K + 2 latent classes, or clusters around rules
$R_1, \ldots, R_K$, 0 and 1, can be examined by a Monte Carlo study.
The procedure for testing this hypothesis is as follows:

Hypothesis: Item parameters of the Method A dataset in Table 1 are equal
to the estimated item parameters of the Monte Carlo
dataset, generated as described below.

1)  Generation of the Monte Carlo data

a.  The 18 classes listed in Table 1 are used in this study. A sample
of size N = 1000 was generated as the union of subsamples with
sizes proportional to the numbers of students classified into the
respective classes in the previous dataset, as shown in the third
column of Table 1 (N = 328).

b.  Slip probabilities for each class on the 40 items are computed
from the original set of item parameters given in Table 2. The
estimates of parameters were calibrated from a larger sample
(N = 534), including the Method A dataset of N = 328 as a subset.

2)  Method of Analysis

a.  Principal component analysis and varimax rotation were carried out
for the Method A sample and the generated data, and their eigenvalues
were examined.

b.  IRT parameters of the Method A sample were calibrated.

c.  IRT parameters of the generated N = 1000 sample were calibrated.

d.  Mean square error was computed and a $\chi^2$ test of disparity
between the two sets of item parameters was carried out (Lord, 1980,
p. 223).
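The data-generation step 1) can be sketched in Python as follows. This is only an illustration of the idea, not the program used in the study: the three classes, their slip probabilities, and their counts are hypothetical stand-ins for the 18 classes and 40 items of Table 1, and the function names are introduced here. The calibration and principal component analyses of step 2) are not shown.

```python
import random


def generate_cluster(rule, slip_probs, size, rng):
    """Draw `size` response vectors around `rule`; item j flips with prob S_j."""
    return [
        [1 - r if rng.random() < s else r for r, s in zip(rule, slip_probs)]
        for _ in range(size)
    ]


def generate_union_sample(classes, total, rng):
    """Union of clusters with sizes proportional to the observed class counts.

    `classes` is a list of (rule, slip_probs, observed_count) tuples.
    Rounding means the result may differ slightly from `total` in general.
    """
    observed_total = sum(count for _, _, count in classes)
    sample = []
    for rule, slip_probs, count in classes:
        size = round(total * count / observed_total)
        sample.extend(generate_cluster(rule, slip_probs, size, rng))
    return sample


# Hypothetical classes on a four-item test: (rule, slip probabilities, count).
classes = [
    ([0, 0, 0, 0], [0.05, 0.05, 0.02, 0.02], 10),
    ([1, 1, 0, 0], [0.20, 0.25, 0.30, 0.15], 30),
    ([1, 1, 1, 1], [0.02, 0.03, 0.05, 0.08], 60),
]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
sample = generate_union_sample(classes, 1000, rng)
print(len(sample))
```

Each generated response vector is the ideal pattern of its rule with independent slips applied item by item, so every cluster follows the bug distribution of its rule by construction.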


Results of analysis d indicated that the two sets of estimated item
parameter values a and b, one from the actual sample of N = 308 and the
other from the generated dataset of N = 1000, do not have significant
differences, as shown in Table 2.

Insert  Table  2  about  here 

This  implies  that  the  conjecture  expressed  by  Equation  20  is  supported. 

Analysis a indicated that the Method A sample and the generated data are
both unidimensional. Their eigenvalues larger than 1.0 are 20.02, 2.17,
2.03, and 1.55 for the Method A sample, and 19.82 and 1.62 for the
generated data.
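
The data-generation step of the Monte Carlo procedure (steps 1a and 1b) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the program used in the study: it assumes a two-parameter logistic item response function, one representative θ per class, and largest-remainder rounding for the proportional subsample sizes. The function names and the toy classes and item parameters are hypothetical.

```python
import math
import random


def p_correct(theta, a, b):
    """Two-parameter logistic item response function (assumed form)."""
    return 1.0 / (1.0 + math.exp(-1.7 * a * (theta - b)))


def slip_probabilities(ideal, theta, items):
    """S_j(theta) = P_j(theta) if R_j = 0, and Q_j(theta) = 1 - P_j(theta) if R_j = 1."""
    probs = []
    for r, (a, b) in zip(ideal, items):
        p = p_correct(theta, a, b)
        probs.append(p if r == 0 else 1.0 - p)
    return probs


def allocate(counts, total):
    """Split `total` into sizes proportional to `counts` (largest-remainder rounding)."""
    s = sum(counts)
    raw = [c * total / s for c in counts]
    sizes = [int(x) for x in raw]
    # Give the leftover units to the classes with the largest fractional parts.
    order = sorted(range(len(counts)), key=lambda i: raw[i] - sizes[i], reverse=True)
    for i in order[: total - sum(sizes)]:
        sizes[i] += 1
    return sizes


def generate(classes, items, total, rng):
    """classes: list of (ideal_pattern, theta, observed_count); returns 0/1 response rows."""
    sizes = allocate([c[2] for c in classes], total)
    data = []
    for (ideal, theta, _), n in zip(classes, sizes):
        slips = slip_probabilities(ideal, theta, items)
        for _ in range(n):
            # Each item score slips away from the ideal pattern with probability S_j.
            data.append([r ^ 1 if rng.random() < s else r for r, s in zip(ideal, slips)])
    return data


# Toy usage: three items and two classes (all values are made up for illustration).
toy_items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]
toy_classes = [([1, 1, 0], 1.0, 2), ([0, 0, 0], -1.0, 1)]
sample = generate(toy_classes, toy_items, 30, random.Random(0))
```

In the study itself the classes, their sizes, and the item parameters would come from Tables 1 and 2; the generated rows would then be calibrated with an IRT program and compared with the Method A estimates.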

Discussion 

This study demonstrates that IRT and latent-class models are
algebraically related and that the IRT conditional density functions of the
items are expressible in terms of those of the latent "null state" class.
Bug distributions introduced by Tatsuoka and Tatsuoka (1987) are here
formulated by using the notion of slips away from the perfect response
pattern R representing a state of knowledge. The bug distribution
follows a compound binomial distribution with slip probabilities S_j(θ),
where S_j(θ) equals P_j(θ) if R_j = 0 and Q_j(θ) if R_j = 1. If a bug distribution
is expressed by the conditional probabilities of item scores given θ,
then it can be considered as the conditional density function of latent
class R. The traditional latent class models require the rather strong
assumption of statistical independence among classes; further, the
latent classes must be mutually exclusive and jointly exhaustive in
order to enable formulation of likelihood functions and estimation of
parameters. Moreover, the underlying foundation of latent class models
assumes that each state of knowledge is discrete and hence that states are
not mutually transferable. However, recent advances in
cognitive psychology have shown that the learning process is very volatile
and that students change their hypotheses or theories while their learning is
in progress. Therefore, the constraints imposed on the latent class
models make it difficult to explain theory changes, or to measure change
scores, although the models can explain cognitive states of knowledge
fairly well, as Paulson's model (1985) does.
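
The compound binomial form of the bug distribution mentioned earlier in this paragraph can be illustrated with a short sketch. It computes the probability of observing k slips away from an ideal pattern, given the slip probabilities S_j(θ) at a fixed θ, and assumes that slips on different items are independent given θ. The function name is hypothetical.

```python
def bug_distribution(slip_probs):
    """Return [Pr(0 slips), ..., Pr(n slips)] for independent item-level slips.

    slip_probs[j] is S_j(theta): P_j(theta) where the ideal score is 0,
    and Q_j(theta) where the ideal score is 1.
    """
    dist = [1.0]  # with no items considered, zero slips is certain
    for s in slip_probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, p in enumerate(dist):
            nxt[k] += p * (1.0 - s)   # no slip on this item
            nxt[k + 1] += p * s       # one more slip
        dist = nxt
    return dist


# Toy usage with made-up slip probabilities for a four-item test.
toy_dist = bug_distribution([0.1, 0.2, 0.05, 0.3])
```

A distribution concentrated near zero slips would correspond to a rule that is applied consistently, which is the sense in which bug distributions support indices of consistency and stability.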

Rule space representation of response patterns enables us to
visualize both the IRT and latent class models in a Cartesian product
space of θ and the value of a linear operator
f(x; θ) = (P(θ) - x, P(θ) - T(θ)) (Tatsuoka, 1985, 1986; Tatsuoka & Tatsuoka, 1987). In this
representation θ plays the role of "gluing" two contrasting
psychological models, IRT and latent classes, in a single two-dimensional
vector space. By so doing, conceptualization of latent states of
knowledge and a continuum scaling of latent ability θ becomes much
easier than thinking in the abstract.
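
As a rough illustration of the second rule space coordinate, the sketch below evaluates f(x; θ) as the inner product of P(θ) - x and P(θ) - T(θ) for a single response pattern. It assumes, following the cited Tatsuoka papers rather than anything stated in this section, that T(θ) is the vector whose elements all equal the mean of the P_j(θ). The function name and the toy values are hypothetical.

```python
def rule_space_f(x, p):
    """Inner product of (P(theta) - x) and (P(theta) - T(theta)).

    x: 0/1 item scores; p: item success probabilities P_j(theta) at a given theta.
    T(theta) is assumed to be the mean of the P_j(theta), repeated for every item.
    """
    t = sum(p) / len(p)
    return sum((pj - xj) * (pj - t) for pj, xj in zip(p, x))


# Toy usage: an easy-item-wrong, hard-item-right pattern versus its reverse.
p_toy = [0.2, 0.8]
f_typical = rule_space_f([0, 1], p_toy)
f_atypical = rule_space_f([1, 0], p_toy)
```

Pairing each value of f with the corresponding θ gives a point in the two-dimensional space described above, so the response vectors of one latent class form a cluster of such points.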

Introduction of bug distributions, instead of the traditional
conditional density approach of latent classes, has made it easier to
derive algebraic relationships between IRT and latent class models and
made it possible to develop a general model of rule space, which is an
expansion of the two leading psychological models. Consistency and
stability of rules, or states of knowledge, are introduced in the context
of distribution theory in this study. However, validation of these new
notions, which characterize each erroneous rule, cognitive error, or latent
class, requires further investigation.

The conjecture raised in this study also requires further
investigation. This study showed that if the union set of several
latent class samples satisfies the unidimensionality condition, then
their first eigenvector in principal component analysis becomes
collinear with the first eigenvector of the IRT sample. In other words,
it is plausible to conjecture that if the centroids of several latent class
samples are on the first principal axis of principal component analysis,
then the IRT model will also fit the union of these latent-class
samples. Grounds for acceptance of this conjecture were provided only
by a Monte Carlo study in this paper. More mathematically rigorous
investigation of the topic is needed.

Table 2

Estimated Item Parameters of Sample A and Generated Sample, and χ² for Testing
the Null Hypothesis a_j = a_j', b_j = b_j'


Method  A  Sample  Generated  Sample 
N=306  N=1000 


